Refactors JSON reader's pushdown automaton #13716
Some perf numbers on V100 for end-to-end JSON reading. Overall slight improvements due to saving an extra pass over the data.
Looks good to me. 🚀
An article explaining the "CUB-style implementation" of `TempStorage` would be useful. It would be great if a future PR changed this to a simpler functor.
/merge
Description
This PR simplifies and cleans up the JSON reader's pushdown automaton.
The pushdown automaton takes as input two arrays: the JSON input symbols and, for each input symbol, its stack context (`{` - JSON object, `[` - JSON array, `_` - Root of JSON).
Previously, we fused the two arrays and materialized them straight to the symbol group id for each combination. A symbol group id serves as the column of the transition table. The symbol group ids array was then used as input to the finite state transducer (FST).
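To make the previous approach concrete, here is a minimal host-side sketch of fusing the two arrays into a materialized array of symbol group ids. It is not the reader's actual code: the reduced set of symbol groups, the names `input_group`, `stack_group`, and `materialize_symbol_group_ids`, and the `stack * NUM_INPUT_GROUPS + input` column layout are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, reduced symbol groups (the real reader distinguishes more).
enum InputGroup : std::uint8_t {
  OPEN_BRACE, CLOSE_BRACE, OPEN_BRACKET, CLOSE_BRACKET, OTHER, NUM_INPUT_GROUPS
};
enum StackGroup : std::uint8_t { ROOT, ARRAY, OBJECT, NUM_STACK_GROUPS };

// Maps a JSON input character to its (assumed) input symbol group.
inline InputGroup input_group(char c)
{
  switch (c) {
    case '{': return OPEN_BRACE;
    case '}': return CLOSE_BRACE;
    case '[': return OPEN_BRACKET;
    case ']': return CLOSE_BRACKET;
    default: return OTHER;
  }
}

// Maps a stack context symbol ('{', '[' or '_') to its (assumed) stack group.
inline StackGroup stack_group(char c)
{
  switch (c) {
    case '{': return OBJECT;
    case '[': return ARRAY;
    default: return ROOT;  // '_'
  }
}

// Extra pass over the data: writes one symbol group id (transition table
// column) per input symbol. Both strings are assumed to have the same length.
std::vector<std::uint8_t> materialize_symbol_group_ids(const std::string& json_input,
                                                       const std::string& stack_context)
{
  std::vector<std::uint8_t> ids(json_input.size());
  for (std::size_t i = 0; i < json_input.size(); ++i) {
    ids[i] = static_cast<std::uint8_t>(stack_group(stack_context[i]) * NUM_INPUT_GROUPS +
                                       input_group(json_input[i]));
  }
  return ids;
}
```

The `ids` vector is the intermediate array that has to be written and then read back before the FST can run, which is the extra pass over the data.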
After the recent refactor of the FST lookup tables, the FST has become more flexible. It now supports arbitrary iterators and the symbol group id lookup table (that maps a symbol to a symbol group id) can now be implemented by a simple function object.
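As a rough illustration of a lookup implemented as a function object and applied through an arbitrary iterator, the sketch below computes the column on the fly for each (input symbol, stack context) pair. Everything here is hypothetical: `PdaSymbolToGroupId`, the hand-rolled `ZipIterator` (a plain C++ stand-in for a fancy iterator), the reduced group layout, and `column_histogram`, which only stands in for a consumer such as the FST and is not its real interface.

```cpp
#include <array>
#include <cstddef>
#include <utility>

constexpr int kNumInputGroups = 5;  // assumed: '{', '}', '[', ']', other
constexpr int kNumStackGroups = 3;  // assumed: root '_', array '[', object '{'
constexpr int kNumColumns     = kNumInputGroups * kNumStackGroups;

// Function object: maps an (input symbol, stack context) pair to a column.
struct PdaSymbolToGroupId {
  int operator()(std::pair<char, char> symbol) const
  {
    int input = 4;  // "other"
    switch (symbol.first) {
      case '{': input = 0; break;
      case '}': input = 1; break;
      case '[': input = 2; break;
      case ']': input = 3; break;
      default: break;
    }
    int stack = 0;  // root '_'
    if (symbol.second == '[') { stack = 1; }
    if (symbol.second == '{') { stack = 2; }
    return stack * kNumInputGroups + input;
  }
};

// Minimal zip-like iterator over two character arrays of equal length.
struct ZipIterator {
  const char* input;
  const char* context;
  std::pair<char, char> operator*() const { return {*input, *context}; }
  ZipIterator& operator++() { ++input; ++context; return *this; }
  bool operator!=(const ZipIterator& other) const { return input != other.input; }
};

// Stand-in consumer: looks up the column for each symbol while iterating,
// without storing a per-symbol array of symbol group ids.
template <typename InputIt, typename Lookup>
std::array<int, kNumColumns> column_histogram(InputIt first, InputIt last, Lookup lookup)
{
  std::array<int, kNumColumns> hist{};
  for (; first != last; ++first) {
    ++hist[static_cast<std::size_t>(lookup(*first))];
  }
  return hist;
}
```

Calling `column_histogram(ZipIterator{in, ctx}, ZipIterator{in + n, ctx + n}, PdaSymbolToGroupId{})` on two length-`n` character arrays visits each pair once and never allocates an intermediate id array.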
This PR takes advantage of the FST's ability to take fancy iterators. We now zip the `json_input` and `stack_context` symbols and pass that `zip_iterator` to the FST.
Checklist